We comment on the performance of the Gaussian discriminant
function with (possibly) non-Gaussian underlying distributions.
An asymptotic expression for the probability of error for the
Gaussian case is given with a formal convergence proof.

I. INTRODUCTION

For many practical problems in two-class pattern recognition,
one has (reliable estimates of) the first two moments of each
class (mean vectors $M_1$, $M_2$ in $R^n$ and covariance matrices
$\Sigma_1$, $\Sigma_2$). Whether or not the underlying distributions
are indeed Gaussian, one proceeds to apply the standard Gaussian
hypothesis test to classify new data. More precisely, one uses the
Gaussian discriminant function $h(X) = \ln \frac{p_2(X)}{p_1(X)}$,
where $p_1$, $p_2$ are multivariate normal with the same first two
moments as the underlying distributions. Applying an affine
transformation to our problem (which has no effect on the
discriminant $h$) that simultaneously diagonalizes $\Sigma_1$ and
$\Sigma_2$ ($\Sigma_1 \to I$, $\Sigma_2 \to \Lambda$, $M_2 \to 0$,
$M_1 \to (d_1, d_2, \ldots, d_n)$ with $d_\ell \ge 0$), and writing
$\lambda_\ell$ for the diagonal entries of $\Lambda$, we have

(1)  $h(X) = \frac{1}{2} \sum_{\ell=1}^{n} \left[ (x_\ell - d_\ell)^2 - x_\ell^2/\lambda_\ell + \ln(1/\lambda_\ell) \right]$
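
As an illustration only, equation (1) can be evaluated directly once the problem is in diagonalized coordinates. The sketch below assumes the inputs are already transformed; the function names `gaussian_discriminant` and `classify`, and the equal-prior threshold at zero, are choices of this example rather than part of the correspondence.

```python
import math

def gaussian_discriminant(x, d, lam):
    """Evaluate h(X) of equation (1) in the diagonalized coordinates.

    x   : sample coordinates x_l
    d   : class-1 mean components d_l (the class-2 mean is 0)
    lam : class-2 variances lambda_l (the class-1 variances are 1)
    """
    total = 0.0
    for x_l, d_l, lam_l in zip(x, d, lam):
        total += (x_l - d_l) ** 2 - x_l ** 2 / lam_l + math.log(1.0 / lam_l)
    return 0.5 * total

def classify(x, d, lam):
    # h = ln(p2/p1): with equal priors (assumed here), h > 0 favors class 2.
    return 2 if gaussian_discriminant(x, d, lam) > 0.0 else 1

print(classify([1.0, 0.0], [1.0, 0.0], [1.0, 1.0]))  # sample at the class-1 mean
print(classify([0.0, 0.0], [1.0, 0.0], [1.0, 1.0]))  # sample at the class-2 mean
```

With unit variances in both classes the quadratic terms cancel and the rule reduces to a linear one, so the two sample points above fall in class 1 and class 2 respectively.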


In this correspondence, we first present some elementary
inequalities in $h$, valid regardless of the class distributions;
and then we demonstrate the asymptotic result:

(2)  $P_{error} \approx \frac{1}{\sqrt{2\pi}} \int_{\sqrt{J}/2}^{\infty} e^{-x^2/2}\,dx$   (with $J$ the divergence of $p_1$, $p_2$)

for the case of equal priors, Gaussian distributions, and all
$\lambda_\ell$ close to 1. We note that the above does not follow from
the elementary fact that, for fixed $n$, $h(X) \to$ a linear function
as all $\lambda_\ell \to 1$; for all $\lambda_\ell$ may be close to 1 but the
quadratic part of $h$ $= \frac{1}{2} \sum_\ell x_\ell^2 (1 - 1/\lambda_\ell)$ may not
approach 0 if $n$ becomes large.
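
The right-hand side of (2) is a standard normal tail probability, so it can be evaluated with the complementary error function. The following is a minimal sketch, assuming the divergence $J$ has already been computed; the name `asymptotic_error` is hypothetical.

```python
import math

def asymptotic_error(J):
    """Right-hand side of (2): the standard normal tail beyond sqrt(J)/2."""
    if J < 0.0:
        raise ValueError("divergence must be non-negative")
    # (1/sqrt(2*pi)) * integral_t^inf exp(-x^2/2) dx = 0.5 * erfc(t / sqrt(2))
    t = math.sqrt(J) / 2.0
    return 0.5 * math.erfc(t / math.sqrt(2.0))

for J in (0.0, 1.0, 4.0, 16.0):
    print(J, asymptotic_error(J))
```

For $J = 0$ the value is exactly 0.5, and for $J = 4$ it is the normal tail beyond 1, roughly 0.159; in the equal-covariance Gaussian case, where $J = D^2$, this coincides with the exact error probability.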


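II. THE GAUSSIAN DISCRIMINANT FOR ARBITRARY CLASS DISTRIBUTIONS

Calculating the first moments of $h$ under each hypothesis,
we have, regardless of the underlying distributions:

(3)  $E_1(h) = \frac{1}{2} \sum_{\ell=1}^{n} \left[ 1 - 1/\lambda_\ell - d_\ell^2/\lambda_\ell + \ln(1/\lambda_\ell) \right]$

(4)  $E_2(h) = \frac{1}{2} \sum_{\ell=1}^{n} \left[ (\lambda_\ell - 1) + d_\ell^2 + \ln(1/\lambda_\ell) \right]$

Since $\lambda - 1 + \ln(1/\lambda) \ge 0$ for all $\lambda > 0$, we see immediately that

(5)  $E_2(h) \ge \frac{1}{2} \sum_\ell d_\ell^2 = \frac{1}{2} D^2$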


Noting that the maximum value of $f(\lambda) = 1 - \frac{1}{\lambda} - \frac{\gamma^2}{\lambda} + \ln(1/\lambda)$
for $\lambda > 0$ occurs at $\lambda = 1 + \gamma^2$, we have

$f(\lambda) \le 1 - \frac{1}{1+\gamma^2} - \frac{\gamma^2}{1+\gamma^2} + \ln\left(\frac{1}{1+\gamma^2}\right) \le -\frac{\gamma^2}{1+\gamma^2}$,

where the last step uses $\ln(1+y) \ge y/(1+y)$ for $y \ge 0$. Hence

(6)  $E_1(h) \le -\frac{1}{2} \sum_\ell \frac{d_\ell^2}{1+d_\ell^2}$

which is $\approx -\frac{1}{2} D^2$ if each component $d_\ell$ is small. Therefore, in
many practical problems $E_2(h) - E_1(h) \gtrsim D^2 = \sum_\ell d_\ell^2$. $D^2$ is
then a first order measure of the performance of $h$. If $n$ is
large, the $\lambda_\ell$ are close to one, the $d_\ell$ are small, and the
sequence of random variables $x_\ell$ is $k$-dependent for small $k$,
then we could apply the central limit theorem and obtain
estimates of the error probability of $h$ by calculating $Var_1(h)$
and $Var_2(h)$ from sample data.
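
The moment formulas (3) and (4) and the bounds (5) and (6) are easy to check numerically, since they depend only on the $d_\ell$ and $\lambda_\ell$. The sketch below uses arbitrary illustrative values; all function names are hypothetical.

```python
import math

def mean_h_class1(d, lam):
    """E_1(h) from (3)."""
    return 0.5 * sum(1.0 - 1.0 / l - dl * dl / l + math.log(1.0 / l)
                     for dl, l in zip(d, lam))

def mean_h_class2(d, lam):
    """E_2(h) from (4)."""
    return 0.5 * sum((l - 1.0) + dl * dl + math.log(1.0 / l)
                     for dl, l in zip(d, lam))

def lower_bound_class2(d):
    """Right-hand side of (5): D^2 / 2."""
    return 0.5 * sum(dl * dl for dl in d)

def upper_bound_class1(d):
    """Right-hand side of (6)."""
    return -0.5 * sum(dl * dl / (1.0 + dl * dl) for dl in d)

# Illustrative values only (not taken from the correspondence).
d = [0.3, 0.1, 0.2]
lam = [1.2, 0.8, 1.05]

print(mean_h_class2(d, lam), ">=", lower_bound_class2(d))
print(mean_h_class1(d, lam), "<=", upper_bound_class1(d))
```

Both inequalities hold for any positive $\lambda_\ell$, not only for values near 1, which is why they apply regardless of how far the covariances differ.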

III. ASYMPTOTIC APPROXIMATION TO ERROR PROBABILITY

To justify the claim in I, we state and prove the following
theorem:

Theorem: Let a sequence of decision problems, with underlying
Gaussian distributions described by means $D^i$, $0$ in $R^{n_i}$ and
covariances $I$, $\Lambda^i$, be given. Then, if
$\max_{1 \le \ell \le n_i} |\lambda_\ell^i - 1| \to 0$ as $i \to \infty$,

$P^i_{error} - \frac{1}{\sqrt{2\pi}} \int_{\sqrt{J^i}/2}^{\infty} e^{-x^2/2}\,dx \to 0$

for the equal prior case.
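Proof: We shall apply a central limit theorem for arrays
of random variables and use the first two moments of $h^i$ to
obtain an asymptotic expression for the error probability.
Calculating the variances under each hypothesis of $h^i$, we
obtain

(7)  $Var_1(h^i) = \sum_{\ell=1}^{n_i} \left[ \frac{1}{2}(1 - 1/\lambda_\ell^i)^2 + (d_\ell^i/\lambda_\ell^i)^2 \right]$

(8)  $Var_2(h^i) = \sum_{\ell=1}^{n_i} \left[ \frac{1}{2}(\lambda_\ell^i - 1)^2 + \lambda_\ell^i (d_\ell^i)^2 \right]$

Using (3), (4), (7) and (8), and noting by elementary calculus
that, as $\lambda_\ell^i \to 1$,

$\frac{1 - 1/\lambda_\ell^i + \ln(1/\lambda_\ell^i)}{(1 - 1/\lambda_\ell^i)^2} \to -\frac{1}{2}$  and  $\frac{\lambda_\ell^i - 1 + \ln(1/\lambda_\ell^i)}{(\lambda_\ell^i - 1)^2} \to +\frac{1}{2}$,

we have

$Var_1(h^i) / 2E_1(h^i) \to -1$  and  $Var_2(h^i) / 2E_2(h^i) \to +1$.

Furthermore,

$\frac{(d_\ell^i)^2}{-(d_\ell^i)^2/\lambda_\ell^i} \to -1$  and  $\frac{(\lambda_\ell^i - 1) + \ln(1/\lambda_\ell^i)}{1 - 1/\lambda_\ell^i + \ln(1/\lambda_\ell^i)} \to -1$

imply that

$E_2(h^i) / E_1(h^i) \to -1$

or equivalently, with $J^i = E_2(h^i) - E_1(h^i)$ the divergence,

$E_2(h^i)/J^i \to +\frac{1}{2}$

$E_1(h^i)/J^i \to -\frac{1}{2}$.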
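
These limits can be observed numerically. The sketch below is a rough check under stated assumptions: the means follow (3) and (4), while the variance expressions are derived from (1) for Gaussian classes $N(D, I)$ and $N(0, \Lambda)$ and should be read as this example's assumption. The function names, the alternating choice of $\lambda_\ell = 1 \pm \epsilon$, and the constant $d_\ell$ are illustrative only.

```python
import math

def moments(d, lam):
    """Return (E1, E2, Var1, Var2) of h for classes N(d, I) and N(0, diag(lam))."""
    e1 = 0.5 * sum(1.0 - 1.0 / l - dl * dl / l + math.log(1.0 / l)
                   for dl, l in zip(d, lam))
    e2 = 0.5 * sum((l - 1.0) + dl * dl + math.log(1.0 / l)
                   for dl, l in zip(d, lam))
    # Variances derived from (1) under the Gaussian assumption.
    v1 = sum(0.5 * (1.0 - 1.0 / l) ** 2 + (dl / l) ** 2 for dl, l in zip(d, lam))
    v2 = sum(0.5 * (l - 1.0) ** 2 + l * dl * dl for dl, l in zip(d, lam))
    return e1, e2, v1, v2

def limit_ratios(eps, n=50, d_value=0.2):
    """Ratios Var1/2E1, Var2/2E2, E2/J, E1/J when every lambda is 1 +/- eps."""
    lam = [1.0 + eps if k % 2 == 0 else 1.0 - eps for k in range(n)]
    d = [d_value] * n
    e1, e2, v1, v2 = moments(d, lam)
    J = e2 - e1
    return v1 / (2.0 * e1), v2 / (2.0 * e2), e2 / J, e1 / J

for eps in (0.1, 0.01, 0.001):
    print(eps, limit_ratios(eps))
```

As `eps` shrinks, the four printed ratios move toward -1, +1, +1/2 and -1/2 respectively.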


We now proceed with the main proof. We may assume (by
passing to subsequences if necessary) that both $J^i$ and
$P^i_{error}$ are convergent sequences (possibly to $+\infty$ in the
case of $J^i$). We divide the argument into several cases:
CASE (1)  $J^i \to 0$

It suffices to show that $P^i_{error} \to \frac{1}{2}$. This is actually
true in general. Consider any 2 positive density functions,
$p$, $q$, on some probability space. Then, if for some real $\delta > 0$,
there is no measurable set whose $q$ measure is greater than
$\delta$ and such that on this set $q/p > 1 + \delta$, it follows that

$P_{error} = \frac{1}{2} \left[ \int_{q \le p} q + \int_{q > p} p \right]$

$= \frac{1}{2} \left[ \int_{q \le p} q + \int_{q/p > 1+\delta} p + \int_{1 < q/p \le 1+\delta} p \right]$

$\ge \frac{1}{2} \left[ \int_{q \le p} q + \int_{1 < q/p \le 1+\delta} (p/q)\, q \right]$

$\ge \frac{1}{2} \cdot \frac{1}{1+\delta} \left[ \int_{q \le p} q + \int_{q/p > 1} q - \int_{q/p > 1+\delta} q \right]$

$\ge \frac{1-\delta}{2(1+\delta)}$.

Hence if $P^i_{error}$ does not approach $\frac{1}{2}$, there is a $\delta > 0$ for
which such a set exists for infinitely many $i$. But then the
divergence

$J^i(p,q) = \int_{p \ge q} \ln(p/q)\,(p-q) + \int_{q > p} \ln(q/p)\,(q-p)$

$\ge [\ln(1+\delta)] \left[ \left( 1 - \frac{1}{1+\delta} \right) \delta \right] = \frac{\delta^2 \ln(1+\delta)}{1+\delta} > 0$

for those $i$, which contradicts $J^i \to 0$.


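CASE (2)  $J^i \not\to 0$

Let's rewrite

$h^i = \frac{1}{2} \sum_{\ell=1}^{n_i} \left[ (x_\ell)^2 [1 - 1/\lambda_\ell^i] - 2 x_\ell d_\ell^i \right] + K^i$

where $K^i = \frac{1}{2} \sum_\ell [(d_\ell^i)^2 + \ln(1/\lambda_\ell^i)]$ is a constant and
we reorder the $d_\ell^i$ such that $d_\ell^i \ge d_{\ell+1}^i$.

Subcase (a)  $\sup_i \sum_{\ell=1}^{n_i} (d_\ell^i)^2 = +\infty$

Clearly from (5) (and $E_1 \le 0$ by (6)), $J = +\infty$. Consider the
(sub-optimal) discriminants $g^i = \sum_{\ell=1}^{n_i} d_\ell^i x_\ell$. These are
normally distributed with means $\sum_\ell (d_\ell^i)^2$ and $0$, and
standard deviations $\sqrt{\sum_\ell (d_\ell^i)^2}$ and $\sqrt{\sum_\ell \lambda_\ell^i (d_\ell^i)^2}$.
One can then find arbitrarily large $i$ for which $g^i$ has
arbitrarily small error probability. Since $h^i$ is optimal, it
has arbitrarily small error for these $i$ and hence,
$P^i_{error} \to 0$.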
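
To see how the error of the linear discriminant $g^i$ shrinks as $\sum_\ell (d_\ell^i)^2$ grows, one can compute it from the means and standard deviations stated above. This is a sketch under assumptions of its own: the threshold is placed midway between the two means (the text does not specify one), priors are equal, and the function names are hypothetical.

```python
import math

def normal_tail(t):
    """Standard normal probability of exceeding t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def linear_discriminant_error(d, lam):
    """Equal-prior error of g = sum_l d_l x_l with a midpoint threshold."""
    D2 = sum(dl * dl for dl in d)                   # class-1 mean and variance of g
    s2 = sum(l * dl * dl for dl, l in zip(d, lam))  # class-2 variance of g
    tau = 0.5 * D2                                  # assumed threshold: midpoint of the means
    miss_1 = normal_tail((D2 - tau) / math.sqrt(D2))  # class-1 sample falls below tau
    miss_2 = normal_tail(tau / math.sqrt(s2))         # class-2 sample exceeds tau
    return 0.5 * (miss_1 + miss_2)

for n in (4, 100, 400):
    d = [0.5] * n
    lam = [1.1 if k % 2 == 0 else 0.9 for k in range(n)]
    print(n, linear_discriminant_error(d, lam))
```

The error falls rapidly with `n` here because $\sum_\ell (d_\ell)^2$ grows linearly while both standard deviations grow only like its square root.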
Subcase (b)  $\sup_i \sum_{\ell=1}^{n_i} (d_\ell^i)^2 < +\infty$

We first note that $Var(h^i) \to J \ne 0$ under either hypothesis.
Let us rewrite

$h^i = \frac{1}{2} \sum_{\ell=1}^{n_i'} \left[ (x_\ell)^2 [1 - 1/\lambda_\ell^i] - 2 x_\ell d_\ell^i \right] + \frac{1}{2} \sum_{\ell=n_i'+1}^{n_i} \left[ (x_\ell)^2 [1 - 1/\lambda_\ell^i] - 2 x_\ell d_\ell^i \right] + K^i = F_1^i + F_2^i + K^i$

with $n_i'$ chosen such that $n_i' \to \infty$ but that [...]. We
may now apply a central limit theorem, for instance Corollary
4.2 on page 232 of [1]: For any $\epsilon > 0$, either $F_2^i$ has variance
$< \epsilon$, or $F_2^i$ becomes normal in distribution for large $i$. This
follows from the central limit theorem for arrays mentioned
above, provided the variances of the terms in the summand
of $F_2^i$ become arbitrarily small, and this follows if
$\sup_{\ell > n_i'} |d_\ell^i| \to 0$. But if this were not the case,
$d^i_{n_i'+1} \ge \gamma > 0$ for infinitely many $i$ and hence, since
$n_i' \to \infty$,

$\sum_{\ell=1}^{n_i'} (d_\ell^i)^2 \ge n_i' \gamma^2$

contradicts our initial assumption. Further, $F_1^i$ either has
variance $< \epsilon$ or approaches a normal random variable in
distribution since its linear part is normal and its nonlinear part
has variance approaching 0. Since $\epsilon$ was arbitrary, $J > 0$,
and $F_1^i$ is independent of $F_2^i$; $h^i$ approaches a normal random
variable in distribution and we obtain the asymptotic error
formula (2).


Finally we note that, in (2), we could replace $J$ by $8B$
where $B$ is the Bhattacharyya distance. This follows from
the simply verified fact that $8B/J \to 1$ as all $\lambda_\ell \to 1$.
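
The ratio $8B/J$ can be checked numerically in the diagonalized coordinates. In the sketch below, $J$ is taken as $E_2(h) - E_1(h)$ from (3) and (4), and $B$ is the standard Bhattacharyya distance between two Gaussians, specialized here to covariances $I$ and $\Lambda$ and means $D$ and $0$; the function names and test values are illustrative assumptions.

```python
import math

def divergence(d, lam):
    """J = E_2(h) - E_1(h) for N(d, I) versus N(0, diag(lam)), from (3) and (4)."""
    return 0.5 * sum((l + 1.0 / l - 2.0) + dl * dl * (1.0 + 1.0 / l)
                     for dl, l in zip(d, lam))

def bhattacharyya(d, lam):
    """Gaussian Bhattacharyya distance for diagonal covariances I and diag(lam)."""
    return sum(dl * dl / (4.0 * (1.0 + l))
               + 0.5 * math.log((1.0 + l) / (2.0 * math.sqrt(l)))
               for dl, l in zip(d, lam))

def ratio_8B_over_J(eps, n=50, d_value=0.2):
    lam = [1.0 + eps if k % 2 == 0 else 1.0 - eps for k in range(n)]
    d = [d_value] * n
    return 8.0 * bhattacharyya(d, lam) / divergence(d, lam)

for eps in (0.5, 0.1, 0.01, 0.001):
    print(eps, ratio_8B_over_J(eps))
```

When every $\lambda_\ell$ equals 1 the two quantities reduce to $J = D^2$ and $B = D^2/8$, so the ratio is exactly 1 in that limit.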